Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 359392 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 35.6 MiB |
| Average record size in memory | 104.0 B |
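The memory figures match every column being stored as one 8-byte value per row. Below is a quick check of that arithmetic. It assumes 8 bytes per cell (int64, float64, datetime64, or an object pointer for the text columns on a 64-bit build) and ignores the small index overhead.

```python
# Assumption: each of the 13 columns uses 8 bytes per row (numeric/datetime values
# or object pointers for strings); the RangeIndex overhead is ignored.
N_ROWS = 359_392
N_COLUMNS = 13
BYTES_PER_CELL = 8
MIB = 1024 ** 2

record_size = N_COLUMNS * BYTES_PER_CELL          # bytes per row
column_size_mib = N_ROWS * BYTES_PER_CELL / MIB   # one column
total_size_mib = N_ROWS * record_size / MIB       # whole frame

print(f"Average record size in memory: {record_size:.1f} B")
print(f"Memory size per column: {column_size_mib:.1f} MiB")
print(f"Total size in memory: {total_size_mib:.1f} MiB")
```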
Variable types
| NUM | 8 |
|---|---|
| CAT | 4 |
| DATE | 1 |
Warnings
| Cost of Trip is highly correlated with KM Travelled | High correlation |
|---|---|
| KM Travelled is highly correlated with Cost of Trip | High correlation |
| df_index has unique values | Unique |
| Transaction ID has unique values | Unique |
Reproduction
| Analysis started | 2021-10-07 06:35:54.414614 |
|---|---|
| Analysis finished | 2021-10-07 06:37:08.106526 |
| Duration | 1 minute and 13.69 seconds |
| Software version | pandas-profiling v2.9.0 |
df_index
Real number (ℝ≥0)
| Distinct | 359392 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 179695.5 |
| Minimum | 0 |
| Maximum | 359391 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 17969.55 |
| Q1 | 89847.75 |
| median | 179695.5 |
| Q3 | 269543.25 |
| 95-th percentile | 341421.45 |
| Maximum | 359391 |
| Range | 359391 |
| Interquartile range (IQR) | 179695.5 |
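df_index holds exactly the integers 0 to 359391, so its quantiles can be reproduced by hand. This sketch assumes the report interpolates linearly between order statistics, which is the NumPy and pandas default. Python's `statistics.quantiles` with `method="inclusive"` applies the same rule.

```python
import statistics

# df_index is exactly the integers 0..359391 (359392 distinct values, min 0, max 359391).
values = range(359392)

# "inclusive" treats the observed min/max as the 0th/100th percentiles and
# interpolates linearly in between (assumed to match the report).
ventiles = statistics.quantiles(values, n=20, method="inclusive")   # 5%, 10%, ..., 95%
quartiles = statistics.quantiles(values, n=4, method="inclusive")   # Q1, median, Q3

quantile_table = {
    "Minimum": min(values),
    "5-th percentile": ventiles[0],
    "Q1": quartiles[0],
    "median": quartiles[1],
    "Q3": quartiles[2],
    "95-th percentile": ventiles[-1],
    "Maximum": max(values),
    "Range": max(values) - min(values),
    "Interquartile range (IQR)": quartiles[2] - quartiles[0],
}
for name, value in quantile_table.items():
    print(f"{name}: {value}")
```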
Descriptive statistics
| Standard deviation | 103747.6783 |
|---|---|
| Coefficient of variation (CV) | 0.5773526789 |
| Kurtosis | -1.2 |
| Mean | 179695.5 |
| Median Absolute Deviation (MAD) | 89848 |
| Skewness | 4.59274699e-17 |
| Sum | 6.458112514e+10 |
| Variance | 1.076358075e+10 |
| Monotonicity | Not monotonic |
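The descriptive statistics can be recomputed from their standard definitions. This sketch assumes two conventions: the sample (n - 1) variance, and the bias-adjusted skewness and excess-kurtosis estimators that pandas uses for `Series.skew` and `Series.kurt`. The `describe` helper is illustrative and is not part of pandas-profiling. For the evenly spaced df_index, the kurtosis comes out near -1.2, the value expected for a uniform distribution.

```python
import math
import statistics


def describe(values):
    """Hypothetical helper: recompute the 'Descriptive statistics' rows.

    Assumes sample (n - 1) variance and pandas-style bias-adjusted
    skewness and excess kurtosis.
    """
    xs = list(values)
    n = len(xs)
    mean = math.fsum(xs) / n
    # Central moments with an n denominator
    m2 = math.fsum((x - mean) ** 2 for x in xs) / n
    m3 = math.fsum((x - mean) ** 3 for x in xs) / n
    m4 = math.fsum((x - mean) ** 4 for x in xs) / n
    variance = m2 * n / (n - 1)      # sample variance
    std = math.sqrt(variance)
    g1 = m3 / m2 ** 1.5              # uncorrected skewness
    g2 = m4 / m2 ** 2 - 3            # uncorrected excess kurtosis
    median = statistics.median(xs)
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - median) for x in xs),
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": math.fsum(xs),
        "Variance": variance,
    }


# df_index holds exactly the integers 0..359391, so its row can be checked directly.
stats = describe(range(359392))
for name, value in stats.items():
    print(f"{name}: {value:.10g}")
```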
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 122257 | 1 | < 0.1% |
| 105881 | 1 | < 0.1% |
| 103832 | 1 | < 0.1% |
| 126359 | 1 | < 0.1% |
| 124310 | 1 | < 0.1% |
| 130453 | 1 | < 0.1% |
| 128404 | 1 | < 0.1% |
| 118163 | 1 | < 0.1% |
| 116114 | 1 | < 0.1% |
| Other values (359382) | 359382 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 359391 | 1 | < 0.1% |
| 359390 | 1 | < 0.1% |
| 359389 | 1 | < 0.1% |
| 359388 | 1 | < 0.1% |
| 359387 | 1 | < 0.1% |
Transaction ID
Real number (ℝ≥0)
| Distinct | 359392 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10220761.19 |
| Minimum | 10000011 |
| Maximum | 10440107 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 10000011 |
|---|---|
| 5-th percentile | 10022854.55 |
| Q1 | 10110809.75 |
| median | 10221035.5 |
| Q3 | 10330937.25 |
| 95-th percentile | 10418091.45 |
| Maximum | 10440107 |
| Range | 440096 |
| Interquartile range (IQR) | 220127.5 |
Descriptive statistics
| Standard deviation | 126805.8037 |
|---|---|
| Coefficient of variation (CV) | 0.01240668884 |
| Kurtosis | -1.19892498 |
| Mean | 10220761.19 |
| Median Absolute Deviation (MAD) | 110064 |
| Skewness | 7.232656511e-05 |
| Sum | 3.673259804e+12 |
| Variance | 1.607971186e+10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10000403 | 1 | < 0.1% |
| 10249510 | 1 | < 0.1% |
| 10235183 | 1 | < 0.1% |
| 10233134 | 1 | < 0.1% |
| 10239277 | 1 | < 0.1% |
| 10237228 | 1 | < 0.1% |
| 10226987 | 1 | < 0.1% |
| 10231081 | 1 | < 0.1% |
| 10229032 | 1 | < 0.1% |
| 10251559 | 1 | < 0.1% |
| Other values (359382) | 359382 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10000011 | 1 | < 0.1% |
| 10000012 | 1 | < 0.1% |
| 10000013 | 1 | < 0.1% |
| 10000014 | 1 | < 0.1% |
| 10000015 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10440107 | 1 | < 0.1% |
| 10440106 | 1 | < 0.1% |
| 10440105 | 1 | < 0.1% |
| 10440104 | 1 | < 0.1% |
| 10440101 | 1 | < 0.1% |
Date of Travel
Date
| Distinct | 1095 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
| Minimum | 2016-01-02 00:00:00 |
| Maximum | 2018-12-31 00:00:00 |
Histogram with fixed size bins (bins=50)
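Date of Travel has 1095 distinct values between 2016-01-02 and 2018-12-31. A short calendar check shows that this covers every day in that span (2016 is a leap year). In other words, no day in the range is without trips.

```python
from datetime import date

first_day = date(2016, 1, 2)    # Minimum of Date of Travel
last_day = date(2018, 12, 31)   # Maximum of Date of Travel
distinct_dates = 1095           # Distinct count reported above

# Inclusive count of calendar days between the two dates
calendar_days = (last_day - first_day).days + 1
days_without_trips = calendar_days - distinct_dates
print(f"{calendar_days} calendar days, {days_without_trips} without any trip")
```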
Company
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Yellow Cab | 274681 | 76.4% |
| Pink Cab | 84711 | 23.6% |
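The frequency columns seem to follow a simple rounding rule. A share that rounds to 0.0% without being zero is shown as "< 0.1%". A share that rounds to 100.0% without covering every row is shown as "> 99.9%". This rule is inferred from the tables themselves: Cost of Trip, for example, shows 180 rows as 0.1% but 178 rows as < 0.1%. The sketch below reproduces that rule and the "top 10 plus Other values (k)" layout, and checks it against the Company counts above. The function names are hypothetical.

```python
from collections import Counter


def fmt_percent(share):
    """Assumed formatting rule, inferred from the report's tables."""
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


def frequency_table(values, top=10):
    """Top-`top` values by count, plus an 'Other values (k)' row for the rest."""
    counts = Counter(values)
    n = sum(counts.values())
    rows = [(value, count, fmt_percent(count / n)) for value, count in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, fmt_percent(other / n)))
    return rows


# Rebuild the Company column from its reported counts.
company = ["Yellow Cab"] * 274681 + ["Pink Cab"] * 84711
for row in frequency_table(company):
    print(row)
```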
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 9.528587169 |
| Min length | 8 |
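The length statistics for a categorical column follow directly from its category counts. For Company, "Yellow Cab" has 10 characters and "Pink Cab" has 8. The mean length is therefore a count-weighted average. The median is 10 because Yellow Cab accounts for more than half of the rows.

```python
import statistics

# Company category counts from the frequency table above
counts = {"Yellow Cab": 274681, "Pink Cab": 84711}

n = sum(counts.values())
mean_length = sum(len(label) * count for label, count in counts.items()) / n
# Median over all rows: repeat each label's length once per occurrence.
median_length = statistics.median(
    len(label) for label, count in counts.items() for _ in range(count)
)
print(f"Max length: {max(map(len, counts))}")
print(f"Median length: {median_length}")
print(f"Mean length: {mean_length:.10g}")
print(f"Min length: {min(map(len, counts))}")
```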
City
Categorical
| Distinct | 19 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| NEW YORK NY | 99885 | 27.8% |
| CHICAGO IL | 56625 | 15.8% |
| LOS ANGELES CA | 48033 | 13.4% |
| WASHINGTON DC | 43737 | 12.2% |
| BOSTON MA | 29692 | 8.3% |
| SAN DIEGO CA | 20488 | 5.7% |
| SILICON VALLEY | 8519 | 2.4% |
| SEATTLE WA | 7997 | 2.2% |
| ATLANTA GA | 7557 | 2.1% |
| DALLAS TX | 7017 | 2.0% |
| Other values (9) | 29842 | 8.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 14 |
|---|---|
| Median length | 11 |
| Mean length | 11.29946409 |
| Min length | 8 |
KM Travelled
Real number (ℝ≥0)
| Distinct | 874 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.56725408 |
| Minimum | 1.9 |
| Maximum | 48 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 1.9 |
|---|---|
| 5-th percentile | 3.57 |
| Q1 | 12 |
| median | 22.44 |
| Q3 | 32.96 |
| 95-th percentile | 42 |
| Maximum | 48 |
| Range | 46.1 |
| Interquartile range (IQR) | 20.96 |
Descriptive statistics
| Standard deviation | 12.23352593 |
|---|---|
| Coefficient of variation (CV) | 0.5420919125 |
| Kurtosis | -1.126875356 |
| Mean | 22.56725408 |
| Median Absolute Deviation (MAD) | 10.45 |
| Skewness | 0.05577890774 |
| Sum | 8110490.58 |
| Variance | 149.6591566 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 33.6 | 1536 | 0.4% |
| 24 | 1080 | 0.3% |
| 22.8 | 1075 | 0.3% |
| 35.7 | 1069 | 0.3% |
| 16.8 | 1065 | 0.3% |
| 37.44 | 1062 | 0.3% |
| 39.6 | 1056 | 0.3% |
| 28.08 | 972 | 0.3% |
| 21.85 | 769 | 0.2% |
| 18 | 754 | 0.2% |
| Other values (864) | 348954 | 97.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.9 | 339 | 0.1% |
| 1.92 | 375 | 0.1% |
| 1.94 | 329 | 0.1% |
| 1.96 | 383 | 0.1% |
| 1.98 | 374 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 48 | 366 | 0.1% |
| 47.6 | 381 | 0.1% |
| 47.2 | 378 | 0.1% |
| 46.8 | 737 | 0.2% |
| 46.41 | 380 | 0.1% |
Price Charged
Real number (ℝ≥0)
| Distinct | 99176 |
|---|---|
| Distinct (%) | 27.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 423.4433113 |
| Minimum | 15.6 |
| Maximum | 2048.03 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 15.6 |
|---|---|
| 5-th percentile | 63.42 |
| Q1 | 206.4375 |
| median | 386.36 |
| Q3 | 583.66 |
| 95-th percentile | 944.89 |
| Maximum | 2048.03 |
| Range | 2032.43 |
| Interquartile range (IQR) | 377.2225 |
Descriptive statistics
| Standard deviation | 274.3789114 |
|---|---|
| Coefficient of variation (CV) | 0.6479708243 |
| Kurtosis | 0.7476354732 |
| Mean | 423.4433113 |
| Median Absolute Deviation (MAD) | 187.22 |
| Skewness | 0.8737614916 |
| Sum | 152182138.5 |
| Variance | 75283.78705 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 298.32 | 18 | < 0.1% |
| 191.27 | 18 | < 0.1% |
| 198.8 | 17 | < 0.1% |
| 181.59 | 17 | < 0.1% |
| 216.37 | 17 | < 0.1% |
| 115.53 | 17 | < 0.1% |
| 79.38 | 16 | < 0.1% |
| 264.83 | 15 | < 0.1% |
| 399.41 | 15 | < 0.1% |
| 248.41 | 15 | < 0.1% |
| Other values (99166) | 359227 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.6 | 1 | < 0.1% |
| 15.75 | 1 | < 0.1% |
| 16.38 | 1 | < 0.1% |
| 16.53 | 1 | < 0.1% |
| 16.76 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2048.03 | 1 | < 0.1% |
| 2016.7 | 1 | < 0.1% |
| 2013.95 | 1 | < 0.1% |
| 1993.83 | 1 | < 0.1% |
| 1981.05 | 1 | < 0.1% |
Cost of Trip
Real number (ℝ≥0)
| Distinct | 16291 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 286.1901128 |
| Minimum | 19 |
| Maximum | 691.2 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 19 |
|---|---|
| 5-th percentile | 46.224 |
| Q1 | 151.2 |
| median | 282.48 |
| Q3 | 413.6832 |
| 95-th percentile | 544.3632 |
| Maximum | 691.2 |
| Range | 672.2 |
| Interquartile range (IQR) | 262.4832 |
Descriptive statistics
| Standard deviation | 157.9936612 |
|---|---|
| Coefficient of variation (CV) | 0.5520584188 |
| Kurtosis | -1.012232752 |
| Mean | 286.1901128 |
| Median Absolute Deviation (MAD) | 131.232 |
| Skewness | 0.1379580609 |
| Sum | 102854437 |
| Variance | 24961.99696 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 362.88 | 186 | 0.1% |
| 479.808 | 184 | 0.1% |
| 471.744 | 180 | 0.1% |
| 205.632 | 178 | < 0.1% |
| 411.264 | 166 | < 0.1% |
| 336.96 | 166 | < 0.1% |
| 428.4 | 164 | < 0.1% |
| 241.92 | 161 | < 0.1% |
| 423.36 | 161 | < 0.1% |
| 443.52 | 160 | < 0.1% |
| Other values (16281) | 357686 | 99.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 2 | < 0.1% |
| 19.19 | 4 | < 0.1% |
| 19.2 | 4 | < 0.1% |
| 19.38 | 2 | < 0.1% |
| 19.392 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 691.2 | 9 | < 0.1% |
| 685.44 | 29 | < 0.1% |
| 679.728 | 14 | < 0.1% |
| 679.68 | 33 | < 0.1% |
| 674.016 | 34 | < 0.1% |
Customer ID
Real number (ℝ≥0)
| Distinct | 46148 |
|---|---|
| Distinct (%) | 12.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19191.65212 |
| Minimum | 1 |
| Maximum | 60000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 544 |
| Q1 | 2705 |
| median | 7459 |
| Q3 | 36078 |
| 95-th percentile | 58189 |
| Maximum | 60000 |
| Range | 59999 |
| Interquartile range (IQR) | 33373 |
Descriptive statistics
| Standard deviation | 21012.41246 |
|---|---|
| Coefficient of variation (CV) | 1.094872517 |
| Kurtosis | -0.885062488 |
| Mean | 19191.65212 |
| Median Absolute Deviation (MAD) | 6362 |
| Skewness | 0.880030242 |
| Sum | 6897326237 |
| Variance | 441521477.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 494 | 54 | < 0.1% |
| 2939 | 53 | < 0.1% |
| 2766 | 51 | < 0.1% |
| 1070 | 51 | < 0.1% |
| 126 | 50 | < 0.1% |
| 944 | 50 | < 0.1% |
| 858 | 50 | < 0.1% |
| 1803 | 50 | < 0.1% |
| 1067 | 50 | < 0.1% |
| 1628 | 50 | < 0.1% |
| Other values (46138) | 358883 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 29 | < 0.1% |
| 2 | 40 | < 0.1% |
| 3 | 46 | < 0.1% |
| 4 | 26 | < 0.1% |
| 5 | 31 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 60000 | 18 | < 0.1% |
| 59999 | 8 | < 0.1% |
| 59998 | 9 | < 0.1% |
| 59997 | 10 | < 0.1% |
| 59996 | 4 | < 0.1% |
Payment_Mode
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Card | 215504 | 60.0% |
| Cash | 143888 | 40.0% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Male | 205912 | 57.3% |
| Female | 153480 | 42.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.854109162 |
| Min length | 4 |
Age
Real number (ℝ≥0)
| Distinct | 48 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.33670477 |
| Minimum | 18 |
| Maximum | 65 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 25 |
| median | 33 |
| Q3 | 42 |
| 95-th percentile | 61 |
| Maximum | 65 |
| Range | 47 |
| Interquartile range (IQR) | 17 |
Descriptive statistics
| Standard deviation | 12.59423447 |
|---|---|
| Coefficient of variation (CV) | 0.3564065906 |
| Kurtosis | -0.4583967778 |
| Mean | 35.33670477 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.6853387826 |
| Sum | 12699729 |
| Variance | 158.6147419 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=48)
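The histograms use equal-width bins. Age is drawn with 48 bins instead of 50, which matches its 48 distinct values. A plausible rule is therefore bins = min(50, number of distinct values). This is an assumption: it is consistent with every column in this report but is not stated by it. With 48 bins spanning ages 18 to 65, each integer age lands in a bin of its own, as the sketch checks. Both helpers are hypothetical plain-Python functions, not the profiler's code.

```python
from bisect import bisect_right


def histogram_edges(minimum, maximum, n_distinct, max_bins=50):
    """Equal-width bin edges; bins = min(max_bins, n_distinct) is an assumed rule."""
    bins = min(max_bins, n_distinct)
    width = (maximum - minimum) / bins
    # Use the exact maximum as the last edge to avoid floating-point drift.
    return [minimum + i * width for i in range(bins)] + [maximum]


def bin_index(value, edges):
    """Bins are half-open [left, right); the last bin also includes the maximum."""
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


# Age: minimum 18, maximum 65, 48 distinct values (from the tables below)
age_edges = histogram_edges(18, 65, n_distinct=48)
# KM Travelled: minimum 1.9, maximum 48, 874 distinct values
km_edges = histogram_edges(1.9, 48, n_distinct=874)

print(len(age_edges) - 1, round(age_edges[1] - age_edges[0], 4))  # 48 bins
print(len(km_edges) - 1, round(km_edges[1] - km_edges[0], 4))     # 50 bins
# Each integer age 18..65 falls in its own bin.
print([bin_index(age, age_edges) for age in range(18, 66)] == list(range(48)))
```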
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 12327 | 3.4% |
| 20 | 12229 | 3.4% |
| 27 | 12030 | 3.3% |
| 25 | 11973 | 3.3% |
| 32 | 11959 | 3.3% |
| 34 | 11825 | 3.3% |
| 39 | 11798 | 3.3% |
| 22 | 11796 | 3.3% |
| 26 | 11655 | 3.2% |
| 19 | 11591 | 3.2% |
| Other values (38) | 240209 | 66.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 10846 | 3.0% |
| 19 | 11591 | 3.2% |
| 20 | 12229 | 3.4% |
| 21 | 11431 | 3.2% |
| 22 | 11796 | 3.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 65 | 3379 | 0.9% |
| 64 | 3908 | 1.1% |
| 63 | 3733 | 1.0% |
| 62 | 3530 | 1.0% |
| 61 | 4361 | 1.2% |
Income (USD/Month)
Real number (ℝ≥0)
| Distinct | 22725 |
|---|---|
| Distinct (%) | 6.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15048.82294 |
| Minimum | 2000 |
| Maximum | 35000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.7 MiB |
Quantile statistics
| Minimum | 2000 |
|---|---|
| 5-th percentile | 3245 |
| Q1 | 8424 |
| median | 14685 |
| Q3 | 21035 |
| 95-th percentile | 29659 |
| Maximum | 35000 |
| Range | 33000 |
| Interquartile range (IQR) | 12611 |
Descriptive statistics
| Standard deviation | 7969.409482 |
|---|---|
| Coefficient of variation (CV) | 0.529570287 |
| Kurtosis | -0.6604857162 |
| Mean | 15048.82294 |
| Median Absolute Deviation (MAD) | 6304 |
| Skewness | 0.3095622398 |
| Sum | 5408426573 |
| Variance | 63511487.49 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 20884 | 134 | < 0.1% |
| 8899 | 133 | < 0.1% |
| 22525 | 129 | < 0.1% |
| 16512 | 121 | < 0.1% |
| 16137 | 118 | < 0.1% |
| 9797 | 116 | < 0.1% |
| 16289 | 116 | < 0.1% |
| 21045 | 114 | < 0.1% |
| 8672 | 112 | < 0.1% |
| 13413 | 111 | < 0.1% |
| Other values (22715) | 358188 | 99.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2000 | 9 | < 0.1% |
| 2001 | 1 | < 0.1% |
| 2002 | 2 | < 0.1% |
| 2003 | 8 | < 0.1% |
| 2004 | 6 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35000 | 1 | < 0.1% |
| 34996 | 15 | < 0.1% |
| 34995 | 4 | < 0.1% |
| 34989 | 30 | < 0.1% |
| 34985 | 16 | < 0.1% |
Pearson's r
The Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exactly linear relationship, this means the steepness of the line does not affect r; only the direction of its slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
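Here is a minimal sketch of that definition, applied to the KM Travelled and Cost of Trip values of the ten rows listed under First rows. Ten rows are a tiny sample, so the result only illustrates the high correlation reported for the full data. The sketch also checks the invariance property. Rescaling Cost of Trip by a positive factor and shifting it leaves r unchanged, while negating it flips the sign.

```python
import math

# KM Travelled and Cost of Trip for the ten rows shown under "First rows".
km = [17.92, 19.04, 37.24, 3.06, 2.10, 32.64, 37.12, 27.30, 5.82, 5.45]
cost = [253.7472, 253.6128, 536.2560, 36.7200, 21.4200,
        349.2480, 507.8016, 343.9800, 76.8240, 69.3240]


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Plain sums; the shared 1/(n - 1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(km, cost)
print(f"r = {r:.3f}")
# A positive rescaling plus a shift leaves r unchanged ...
print(f"r after 3*cost + 100 = {pearson_r(km, [3 * c + 100 for c in cost]):.3f}")
# ... while a negative scale factor flips its sign.
print(f"r after -cost = {pearson_r(km, [-c for c in cost]):.3f}")
```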
Spearman's ρ
The Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
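Spearman's ρ is Pearson's r computed on ranks. This sketch gives tied values the average of the ranks they occupy, and it uses the same ten First rows values. In that sample only the first two rows swap order between KM Travelled and Cost of Trip. The result is therefore ρ = 1 - 6·2/(10·99) ≈ 0.988. A nonlinear but monotonic transform, such as cubing, keeps ρ at 1.

```python
import math

# KM Travelled and Cost of Trip for the ten rows shown under "First rows".
km = [17.92, 19.04, 37.24, 3.06, 2.10, 32.64, 37.12, 27.30, 5.82, 5.45]
cost = [253.7472, 253.6128, 536.2560, 36.7200, 21.4200,
        349.2480, 507.8016, 343.9800, 76.8240, 69.3240]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


print(f"rho = {spearman_rho(km, cost):.3f}")
print(f"rho(km, km**3) = {spearman_rho(km, [v ** 3 for v in km]):.3f}")
```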
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
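Counting pairs directly gives the form described above, usually called τ_a. When there are ties, libraries commonly use the τ_b variant instead; SciPy's `scipy.stats.kendalltau`, for instance, defaults to it. This sketch uses the same ten First rows values. Of the 45 pairs, 44 are concordant and 1 is discordant (the first two rows), so τ = 43/45 ≈ 0.956.

```python
from itertools import combinations

# KM Travelled and Cost of Trip for the ten rows shown under "First rows".
km = [17.92, 19.04, 37.24, 3.06, 2.10, 32.64, 37.12, 27.30, 5.82, 5.45]
cost = [253.7472, 253.6128, 536.2560, 36.7200, 21.4200,
        349.2480, 507.8016, 343.9800, 76.8240, 69.3240]


def pair_counts(x, y):
    """Count concordant and discordant pairs; pairs tied in x or y count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, n(n - 1)/2."""
    n = len(x)
    concordant, discordant = pair_counts(x, y)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(pair_counts(km, cost))
print(f"tau = {kendall_tau_a(km, cost):.3f}")
```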
Phik (φk)
Phik (φk) is a new, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
First rows
|  | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10422 | 10000845 | 2016-01-02 | Yellow Cab | NEW YORK NY | 17.92 | 561.71 | 253.7472 | 9 | Card | Male | 32 | 21212 |
| 1 | 14242 | 10000961 | 2016-01-02 | Yellow Cab | NEW YORK NY | 19.04 | 634.46 | 253.6128 | 85 | Card | Male | 19 | 19765 |
| 2 | 13252 | 10000929 | 2016-01-02 | Yellow Cab | NEW YORK NY | 37.24 | 1065.31 | 536.2560 | 439 | Cash | Male | 22 | 5494 |
| 3 | 11247 | 10000869 | 2016-01-02 | Yellow Cab | NEW YORK NY | 3.06 | 104.70 | 36.7200 | 475 | Cash | Male | 36 | 9959 |
| 4 | 2025 | 10000145 | 2016-01-02 | Pink Cab | NEW YORK NY | 2.10 | 37.18 | 21.4200 | 502 | Cash | Male | 28 | 15285 |
| 5 | 2162 | 10000149 | 2016-01-02 | Pink Cab | NEW YORK NY | 32.64 | 498.60 | 349.2480 | 533 | Card | Male | 52 | 15974 |
| 6 | 14700 | 10000975 | 2016-01-02 | Yellow Cab | NEW YORK NY | 37.12 | 1238.35 | 507.8016 | 573 | Card | Male | 34 | 2589 |
| 7 | 9564 | 10000818 | 2016-01-02 | Yellow Cab | NEW YORK NY | 27.30 | 810.52 | 343.9800 | 818 | Card | Male | 18 | 8653 |
| 8 | 9348 | 10000812 | 2016-01-02 | Yellow Cab | NEW YORK NY | 5.82 | 171.76 | 76.8240 | 901 | Cash | Male | 36 | 20574 |
| 9 | 10359 | 10000843 | 2016-01-02 | Yellow Cab | NEW YORK NY | 5.45 | 175.87 | 69.3240 | 957 | Card | Male | 61 | 4347 |
Last rows
|  | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 359382 | 177132 | 10434172 | 2018-12-31 | Yellow Cab | BOSTON MA | 4.52 | 69.01 | 62.9184 | 58205 | Card | Male | 41 | 21247 |
| 359383 | 104755 | 10437748 | 2018-12-31 | Yellow Cab | BOSTON MA | 19.80 | 286.36 | 237.6000 | 58809 | Cash | Female | 42 | 19371 |
| 359384 | 307802 | 10434224 | 2018-12-31 | Yellow Cab | BOSTON MA | 30.07 | 446.61 | 433.0080 | 58956 | Card | Male | 39 | 24646 |
| 359385 | 248975 | 10437814 | 2018-12-31 | Yellow Cab | BOSTON MA | 17.10 | 238.07 | 240.0840 | 59185 | Card | Female | 42 | 11396 |
| 359386 | 152491 | 10434288 | 2018-12-31 | Yellow Cab | BOSTON MA | 2.26 | 31.37 | 28.2048 | 59187 | Cash | Female | 52 | 13751 |
| 359387 | 228341 | 10433128 | 2018-12-31 | Pink Cab | BOSTON MA | 29.97 | 390.42 | 317.6820 | 59274 | Card | Female | 25 | 22928 |
| 359388 | 321484 | 10437817 | 2018-12-31 | Yellow Cab | BOSTON MA | 38.85 | 504.11 | 540.7920 | 59494 | Cash | Female | 35 | 17699 |
| 359389 | 306342 | 10433131 | 2018-12-31 | Pink Cab | BOSTON MA | 27.27 | 370.20 | 324.5130 | 59768 | Cash | Female | 25 | 24526 |
| 359390 | 21766 | 10437732 | 2018-12-31 | Yellow Cab | BOSTON MA | 25.30 | 362.48 | 352.1760 | 59925 | Cash | Male | 36 | 24313 |
| 359391 | 164871 | 10436696 | 2018-12-31 | Pink Cab | BOSTON MA | 27.55 | 377.85 | 330.6000 | 60000 | Cash | Female | 27 | 20303 |